Testing Distribution Identity Efficiently
Author
Abstract
We consider the problem of testing distribution identity. Given a sequence of independent samples from an unknown distribution on a domain of size n, the goal is to check whether the unknown distribution approximately equals a known distribution on the same domain. While Batu, Fortnow, Fischer, Kumar, Rubinfeld, and White (FOCS 2001) proved that the sample complexity of the problem is Õ(√n · poly(1/ε)), the running time of their tester is much higher: O(n) + Õ(√n · poly(1/ε)). We modify their tester to achieve a running time of Õ(√n · poly(1/ε)).

Let p and q be two probability distributions on [n], and let ‖p − q‖1 denote the ℓ1-distance between p and q. In this paper, algorithms have access to two distributions, p and q.
• The distribution p is known: for each i ∈ [n], the algorithm can query the probability pi of i in constant time.
• The distribution q is unknown: the algorithm can only obtain an independent sample from q in constant time.

An identity tester is an algorithm such that:
• if p = q, then it accepts with probability at least 2/3,
• if ‖p − q‖1 ≥ ε, then it rejects with probability at least 2/3.

Batu, Fortnow, Fischer, Kumar, Rubinfeld, and White [BFF+01] proved that there is an identity tester that uses only Õ(√n · poly(1/ε)) samples from q. A shortcoming of their algorithm is its running time of O(n) + Õ(√n · poly(1/ε)). In this note, we show that their tester can be modified to achieve a running time of Õ(√n · poly(1/ε)). It is also well known that Ω(√n) samples are required to tell the uniform distribution on [n] from a distribution that is uniform on a random subset of [n] of size n/2.

1 The Original Tester

We now describe the tester of Batu et al. [BFF+01], which is outlined as Algorithm 1. Let ε′ = ε/C, where C is a sufficiently large positive constant. The tester starts by partitioning the set [n] into k + 1 =
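As a rough illustration of the access model and of the hard instance mentioned above (the uniform distribution versus a distribution that is uniform on a random half of the domain), the following Python sketch may help. It is not the tester of [BFF+01]. The helper names (`l1_distance`, `uniform`, `uniform_on_random_half`, `make_sampler`) are hypothetical, the domain is indexed from 0 rather than 1, and the sample oracle here costs O(n) per draw rather than the constant time the model assumes.

```python
import random


def l1_distance(p, q):
    """Return the l1-distance sum_i |p_i - q_i| between two distributions given as lists."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))


def uniform(n):
    """Known distribution p: uniform on a domain of size n (query p[i] directly)."""
    return [1.0 / n] * n


def uniform_on_random_half(n, rng):
    """Distribution that is uniform on a random subset of the domain of size n // 2."""
    support = set(rng.sample(range(n), n // 2))
    return [2.0 / n if i in support else 0.0 for i in range(n)]


def make_sampler(q, rng):
    """Sample oracle for the unknown distribution q: each call returns one independent draw.

    Illustration only: rng.choices takes O(n) time per call here, whereas the
    model in the text assumes each sample is obtained in constant time.
    """
    items = range(len(q))
    return lambda: rng.choices(items, weights=q, k=1)[0]


rng = random.Random(0)          # fixed seed so the run is reproducible
n = 1000
p = uniform(n)                  # known distribution
q = uniform_on_random_half(n, rng)  # "unknown" distribution, seen only via samples
distance = l1_distance(p, q)    # each of the n entries differs by 1/n
sample_q = make_sampler(q, rng)
draws = [sample_q() for _ in range(20)]
print(distance, draws)
```

In this instance every entry of p and q differs by exactly 1/n, so the two distributions are at ℓ1-distance 1 even though a small number of draws from q reveals little about which half of the domain carries the mass.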
Similar resources
Sample-Optimal Identity Testing with High Probability
We study the problem of testing identity against a given distribution (a.k.a. goodness-of-fit) with a focus on the high confidence regime. More precisely, given samples from an unknown distribution p over n elements, an explicitly given distribution q, and parameters 0 < ε, δ < 1, we wish to distinguish, with probability at least 1 − δ, whether the distributions are identical versus ε-far in to...
.1 Error
Randomized algorithms have an additional primitive operation that deterministic algorithms do not have. We can select a number from a range [1 . . .x] uniformly at random, at a cost assumed to be linearly dependent on the size of x in binary representation. The algorithm then makes a decision based on the outcome of this random selection. We first look at some defining characteristics of random...
The Efficiency of Quantum Identity Testing of Multiple States
We examine two quantum operations, the Permutation Test and the Circle Test, which test the identity of n quantum states. These operations naturally extend the well-studied Swap Test on two quantum states. We first show the optimality of the Permutation Test for any input size n as well as the optimality of the Circle Test for three input states. In particular, when n = 3, we present a semi-cla...
Approximation of Uniform and Product Distributions, k-Restrictions, Group Testing
We generalize the method that constructs k-restrictions by solving Set-Cover on a k-independent sample space (construction #3 that was shown in class), by using any k-wise efficiently approximatable distribution with ε-density. We then apply this generalized construction to solve the Group Testing problem. We improve upon the constructions shown in class by getting rid of the ek factor, which c...
On the Power Function of the LRT Against One-Sided and Two-Sided Alternatives in Bivariate Normal Distribution
This paper addresses the problem of testing simple hypotheses about the mean of a bivariate normal distribution with identity covariance matrix against restricted alternatives. The LRTs and their power functions for such types of hypotheses are derived. Furthermore, through some elementary calculus, it is shown that the power function of the LRT satisfies certain monotonicity and symmetry p...
Journal title:
- CoRR
Volume abs/0910.3243, issue
Pages -
Publication date: 2009